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A method to reconstruct the energy landscape of smaU peptides is presented with reference to 
a 2d off-lattice model. The starting point is a statistical analysis of the configurational distances 
between generic minima and directly connected pairs (DCP). As the mutual distance of DCP is 
typically much smaller than that of generic pairs, a metric criterion can be established to identify 
the great majority of DCP. Advantages and limits of this approach are thoroughly analyzed for 
three different heteropolymeric chains. A funnel-like structure of the energy landscape is found in 
all of the three cases, but the escape rates clearly reveal that the native configuration is more easily 
accessible (and is significantly more stable) for the sequence that is expected to behave as a real 
protein. 
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I. INTRODUCTION 

Several states of matter are characterized by a rich en- 
ergy landscape (EL), which, in turn, hints at peculiar 
structural and dynamical features. Supercooled liquids, 
glasses, atomic clusters and biomolecules (ij are typical 
examples of systems whose complex thermodynamic be- 
havior can be traced back to the intricate topological 
properties of the EL. The pioneering work by Stillingcr 
and Weber on "inherent" structures of liquids ,2| revealed 
the importance of investigating the stationary points of 
the EL for characterizing their dynamical and thermo- 
dynamic properties. Similar approaches have been pro- 
posed and successfully applied to the identification of the 
structural-arrest temperature in glasses Q and super- 
cooled liquids P|. 

More recently, this kind of analysis has been extended 
to the study of protein models [l,|a,l3- They suggest that 
also the folding process of a protein towards its native 
configuration (NC) depends on the structure of its EL. 
This has been found to possess a funnel-like shape: the 
NC is located inside the so-called native valley (NV) at 
the bottom of the funnel Q . 

Below the folding temperature, the evolution from a 
coil state to the NC is determined by the propensity 
of the protein to enter the relatively small fraction of 
states composing the NV without having to visit the en- 
tire phase-space. In a statistical sense, the folding process 
can be viewed as a weighted sampling mechanism which 
favours specific intermediate configurations. They corre- 
spond to assembling the building blocks which eventually 
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constitute the NC. Well above the folding temperature, 
no marked difference exists among the various states and 
the protein spends most of the time jumping between 
different random coil configurations. 

In the absence of external forces, only thermal fluctu- 
ations can drive the protein dynamics through different 
regions of the EL. In particular, below the folding tem- 
perature, the protein is expected to evolve mainly inside 
the NV. Nonetheless, large deviations from the NC can- 
not be avoided, but they are both rare and very short 
lived. This scenario was confirmed by simulations per- 
formed in a 2d off-lattice modelQ- A more detailed 
analysis of the model |0 revealed that the protein dy- 
namics can viewed as a sequence of jumps between pairs 
of minima separated by one saddle, that we call directly 
connected pairs (DCP). Each jump is a thermally acti- 
vated process: the protein performs random oscillations 
in the basin of a local minimum until a sufficiently large 
thermal fluctuation allows it to overtake the energy bar- 
rier separating the minimum from a neighbouring one. 
In a high-dimensional space, the transition rate can be 
computed as the product of the Arrhenius factor times 
an entropic weight which depends on the DCP and on the 
saddle curvatures (see Section llV)l . In other words, the 
relevant information about the protein dynamics can be 
obtained from the knowledge of the DCP and of the cor- 
responding saddles. However, the reconstruction of the 
EL is a very difficult task to be accomplished in prac- 
tice. Indeed, the identification of DCP by a systematic 
exploration of the entire set of the N identified minima 
requires to exploring N'^ pairs, which is already on the 
order of 10^ for the partial database generated (see the 
Appendix for a description of the algorithm) in the rela- 
tively small polypeptidic chains investigated in this paper 
(see section^). It is therefore, crucial to develop effective 
strategies for identifying DCP within the set of all, a pri- 
ori possible, candidates. This is the main issue addressed 
in this paper. 
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It is reasonable to conjecture that the distance separat- 
ing DCP is typically smaller than that between generic 
pairs of minima. It is therefore tempting to restrict the 
analysis to those pairs whose mutual distance is smaller 
than some prescribed threshold. However, whether this 
approach effectively works may depend on several fac- 
tors one of which is the adopted definition of distance. 
For this reason, in section ITTll we introduce and compare 
different conformational distances. It turns out that the 
bond-angle distance, defined by the absolute-value norm 
is the one that makes the implementation of such a cri- 
terion most accurate in the identification of DCP. The 
price one has to pay is that, unavoidably, all DCP char- 
acterized by a distance larger than the given threshold 
are missed. The detailed analysis of the EL dscribed in 
Section IIVI indicates that the average distance between 
DCP is larger for minima that are closer to the NC. Ac- 
cordingly, the computational advantage guaranteed by 
the choice of a relatively small threshold could be frus- 
trated by the loss of important connections located in 
the NV. For this reason, we argue that the distance cri- 
terion has to be complemented by a systematic search 
of all DCP involving minima in the NV, a much more 
accessible task, given the limited number of such minima 
(for an operative definition of the NV see section ITVji . 



II. THE MODEL 

The model studied in this paper is a modified version 
of the 2d off-lattice model introduced by Stillinger et al. 
[m and already investigated in Ref. % It consists of a 
chain of L point-like monomers mimicking the residues 
of a polypeptidic chain. For the sake of simplicity, only 
two types of residues are considered: hydrophobic (H) 
and polar (P) ones. Any chain is unambiguously identi- 
fied by a sequence of binary variables {^i} (i = 1, . . . , L), 
where = ±1 corresponds to H and P residues, respec- 
tively. The intramolecular potential is composed of three 
terms: a stiff nearest- neighbour harmonic potential, Vi, 
intended to maintain the bond distance almost constant, 
a three-body interaction V2 , which accounts for the bend- 
ing energy and a long-range Lennard-Jones interaction, 
V'i, acting on all pairs i, j such that |i — j| > 1 
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Here, r^j is the distance between the i-th and the j-th 
monomer and 9i is the bond angle at the i-th monomer. 
The parameters a = 20 and — 1 (both expressed in 
adimensional arbitrary units) fix the strength of the har- 
monic force and the equilibrium distance between subse- 
quent monomers (which, in real proteins, is on the order 
of a few A). The value of a is chosen to ensure a value 



for Vi much larger than the other terms of potential 
in order to reproduce the stiffness of the protein back- 
bone. V3 is the only term of the potential energy which 
depends on the nature of the monomers: the parameters 
Ci,j = I (1 + + + ^^i^j ) ^^'C chosen in such a way that 
the interaction is attractive if both residues are either hy- 
drophobic or polar {cij = 1 and 1/2, respectively), while 
it is repulsive if the residues belong to different species 
(C, - -1/2). 

Accordingly, the Hamiltonian of the system reads 

i=l i=l 
L-1 L-2 L 

+ E^2(0.) + E E ^3(r.„c.,^,) (2) 

1=2 1=1 j=i+2 

where, for the sake of simplicity, all monomers are as- 
sumed to have the same unitary mass and the momenta 
are defined as {px,i,Py,i) = iii,yi)- Despite its simplic- 
ity, this toy-model of a heteropolymer does reproduce 
the expected properties of polypeptidic chains and is thus 
very useful for testing dynamical and statistical indica- 
tors. For instance, accurate Monte-Carlo simulations, 
performed by employing innovative schemes [T^ . have 
revealed that, in analogy with real proteins, only a few 
sequences systematically fold onto the same native struc- 
ture: this is why they have been named "good folders" 
[T3L . Such studies have been confirmed and comple- 
mented by direct molecular dynamics simulations 

In this paper we limit ourselves to investigating the 
three following sequences of twenty monomers, 

• [SO] a homopolymer composed of 20 H residues; 

• [S1] = [HHHP HHHP HHHP PHHP PHHH] a se- 
quence that has been identified as a good folder in 

0; 

• [S2] = [PPPH HPHH HHHH HHHP HHPH] a ran- 
domly generated sequence, that has been identified 
as a bad folder in 

These sequences have been chosen because they repre- 
sent the three classes of different folding behaviors ob- 
served in this model. The main thermodynamic fea- 
tures of all of them can be summarized with reference to 
three different transition temperatures [l3|. Decreasing 
the temperature from high values, one first encounters 
the temperature Tg below which the sequence is typi- 
cally found in a collapsed configuration, rather than in a 
random-coil one jl^]. Then, one finds the so-called fold- 
ing temperature Tf below which the heteropolymer stays 
predominantly in the native valley. Finally, at even lower 
temperatures one finds Tg: this is the glass-transition 
temperature, below which a structural arrest of the sys- 
tem occurs. 

In the following we aim at a deeper understanding of 
the folding process by investigating directly the EL struc- 
ture of all these sequences. 
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III. ANALYSIS OF THE ENERGY LANDSCAPE 

As pointed out in the introduction, the main prob- 
lem for reconstructing the EL amounts to finding DCP 
and the corresponding saddles of the potential energy 
V = Vi + V2 + V3 (see Eq.(|21l). An exhaustive search 
of DCP among all pairs of minima rapidly becomes un- 
feasible with increasing the chain length L, due to the 
exponential increase of the number of minima with L 
itself f^. Since the total number of DCP is a rather 
small fraction of all possible pairs (see, e.g., the Table 
in the Appendix), it would be very helpful finding an 
effective criterion to restrict the search of potentially di- 
rectly connected minima. A priori, the distance seems to 
be the right indicator to discriminate between connected 
and not-contiguous pairs of minima. In this section we 
investigate several definitions of distance with the goal of 
identifying the most appropriate one to identify DCP. 

As a first candidate, we introduce the generalized an- 
gular distance 5'f \Ci,C2) between configurations Ci and 
C2, 



SO, SI and S2). This confirms the naive idea that DCP 
are typically much closer than randomly chosen pairs of 



mmima. 



4'^(Cl,C2) = 



1 



L-l 



L-2 



^|0(z;Ci)-0(z;C2)|« 



(3) 

where 0{i,C) is the i-th bond angle of the configuration 
C. Notice that this angular distance is much sensitive 
to local flucutuations along the chain. A generalized dis- 
tance which depends more on the global than on the local 
structure of a configurations is 



L 

' ■i>i + l 



|r(*,j;Ci)-r(i,j;C2)r 



(4) 

where r(i, j; C) is the intra-bead distance between ith and 
jth monomer of the configuration C. This distance is re- 
lated to the x-indicator, previously employed in the anal- 
ysis of the folding dynamics in on- and off-lattice models 
of heteropolymers in 2d and 3d For g = 1, 2, and 

+00, both definitions of generalized distances reduce to 
the standard absolute-value, Euclidean and maximum 
norms, respectively. Such distances have been computed 
for all the pairs of minima in the databases of the se- 
quences SO, SI and S2. The algorithm used to generate 
the databases is described in the Appendix. We want to 
point out that any numerical procedure, including ours, 
cannot guarantee the identification of all minima and sad- 
dles in the EL. Nonetheless, we have independently ver- 
ified that the algorithm allows obtaining at least a very 
accurate description of the native valley. 

Then, we have computed the probability densities of 
the generalized distances between generic (P{S)) and di- 
rectly connected (Pjy^y) pairs of minima. In all cases, P 
has a bell shape with a maximum close to 1, while Pc is 
sharply peaked at much smaller values (see, e.g., Fig.^ 




FIG. 1; Probability density of angular distances for all 
the pairs of minima, Pa (dashed line), and for DCP, Pc (solid 
line), for the sequences SO (a) and SI (b) and S2 (c). 



A qualitatively similar difference between P and Pc is 
observed also for different choices of q as well as for the 
global distance ^'^^ . In order to identify the most appro- 
priate value of (7, it is convenient to introduce the inte- 
grated fraction R[S) of pairs of minima whose distance is 
smaller than b 



where P{6^q^) and Pc[S^q^) are plotted for the sequences 
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and, equivalently, 

Rc{S)= I Pc{x)dx, (6) 

relative to DCP. Upon considering 5 as a dummy vari- 
able, it is possible to plot Rc versus R. A fast conver- 
gence of Rc to 1 means that almost all DCP can be al- 
ready identified by limiting the search to relatively close 
pairs of minima. The data plotted in Fig[2K for SI and 
(7 = 1 indeed reveal such a fast growth of R^ that almost 
90% of the DCP can be obtained by investigating only 
1% of all pairs both by considering the angular S^g'^ and 

the global si'^'^ distance. The curves in FigI2h show that 
significantly worse performances are obtained when the 
maximum norm {q — oo) is used to classify the pairs of 
minima. In order to clarify the role of the parameter q, in 
Figs l3l4l we have plotted Rc versus R for different values 
of this parameter. There we see that, upon decreasing 
q from oo down to 1, Rc exhibits an increasingly fast 
saturation, while the opposite is observed when q is fur- 
ther decreased below 1. The bad performance observed 
at high values has a quite intuitive explanation: in that 
limit, the norm reduces to the maximum norm and the 
slow growth of Rc tells us that distances between DCP 
are not uniformly small along all directions: DCP may 
significantly differ along specific directions in spite of be- 
ing "on the average" much closer than generic pairs of 
minima. The relatively bad performance observed for 
g — > has a complementary explanation. In that limit, 
the average distance is strongly biased by small differ- 
ences 59 or <5r, whose occasional occurrence may induce 
to classify as "close" , configurations that are significantly 
different instead. In all configurations we have investi- 
gated, it turns out that g ~ 1 is the best compromise 
between the above two effects. Having established that 
g = 1 is the best choice, from now on, we limit ourselves 
to considering that value and drop the superscript (1) in 
the definition of the distance. 

In practice, since it is eventually necessary to identify 
a threshold distance it is convenient to look also at 
the dependence of Rc on 5e and to introduce 

From Fig. [SI we see that 6q = 0.5 is a good choice, since 
p{0.5) ~ 0(10-2), while i?c(0.5) - 99%. Even reducing 
the threshold value to 6g — 0.2 a large fraction of DCP 
are still recovered (i?c(0.2) - 90%). 

These results indicate that if one restricts the search of 
DCP to the set of minima whose distance is smaller than 
a prescribed threshold 6g, one can reduce significantly 
the most time-consuming part of the systematic search 
algorithm, which amounts to testing whether a generic 
pair of minima is separated by a single saddle. 

The drawback is that for any choice of Sg, those DCP 
whose mutual distance is anomalously large are going to 
be missed. In the next Section we analyse the EL with 
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FIG. 2: Integrated probability density Rc versus Ra for the 
angular Sq (solid lines) and global 5r (dashed lines) distances. 
The data refer to the the sequence SI and to the absolute 
value (a) and maximum (b) norm. 
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FIG. 3: Integrated probability density Rc versus Ra for the 
angular distance for various norms for the sequence SO. Con- 
tiuous lines correspond to q < 1 (q = 0.01, q = 0.2, q = 0.5, 
and q = 0.75, going from the thinnest to the tickest line, re- 
spectively). The dashed line corresponds to g = 1 (norm of 
the absolute value). The dotted lines correspond g > 1 (g = 2, 
q = 5, and g = co, going from the thickest to the thinnest 
line, respectively). 



the main goal of concluding whether such a component 
is of some relevance in the overall reconstruction of the 
EL. 
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FIG. 4: Integrated probability density Rc versus Ra for the 
angular distance for various norms for the sequences SI (a) 
and S2 (b). The notations are the same as in Fig|3| 



IV. A CLOSER INSPECTION OF THE ENERGY 
LANDSCAPE 

A faithful reconstruction of the EL requires a suffi- 
ciently large database of stationary points, i.e. minima 
and saddles. The procedure described in the Appendix is 
quite reliable in this respect, but it is also computation- 
ally very time-consuming as already stated. In the pre- 
vious section we have seen that a large fraction of DCP 
can be obtained by adopting a suitable metric criterion. 
However, it is not a priori clear whether the long-distance 
tail of Pc is qualitatively irrelevant too. 

In order to shed some light on this question we have 
divided the minima into "shells" : the n-th shell is de- 
fined as the collection of all minima which are separated 
from the NC by at least n saddles. The identification 
of the minima belonging to each shell can be achieved 
recursively: 

• the 0-th shell coincides with NC; 

• a minimum C\ , directly connected to a minimimum 
C2 of the i-th shell, is identified as part of the i-f 1-st 
shell if C2 does not belong to a shell of order j < i. 

In Fig.lHl we report the average value 5g of the distance 
between DCP belonging to consecutive shells for all of 
the three sequences. This figure indicates that the inter- 
minimum distance grows in the vicinity of the NC. The 
average distance {6g) between DCP lying inside the same 
shell exhibits the same behaviour. Finally, this rarefied 
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FIG. 5: The ratio p between DCP and all pairs up to distance 
5e (continuous line) compared with i?c(<5e) (dashed line) for 
the three analyzed sequences, SO (a), SI (b) and S2 (c). 



density of minima in the vicinity of the NC is confirmed 
also by plotting the mutual distance versus the actual dis- 
tance from the NC. In this case, in order to smoothen the 
wild pair-to-pair fluctuations, a coarse-graining has been 
performed by averaging over bins of widht 0.01 along the 
Se axis (see Fig. [T)). 

Altogether, the significative increase of the average dis- 
tance close to the NV indicates that some of the DCP 
that are missed by the metric criterion discussed in pre- 
vious section lie in the most relevant region for the char- 
acterization of the folding/unfolding processes. However, 
considering the limited number of minima in the NV, 
such a difficulty can be easily overcome by complement- 
ing the overall application of the metric criterion with an 
extensive comparison of such minima with all configura- 
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FIG. 6: Average connection length between the minima of the 
i-th and i — 1-th shell for the three sequences SO (triangles), 
SI (circles) and S2 (filled diamonds). 




5.(I,NC) 



FIG. 7: The average distance of minima from their connected 
set versus their distance from the NC for the sequence SO 
(triangles), SI (circles) and S2 (filled diamonds). The data 
are averaged also over intervals of length / = 0.01 along the 
horizontal axis. 



tions in the database: the additional cost in terms of the 
computing time is indeed a minor one. 

The increased distance between neighbouring minima 
hints at possibly deeper valleys and is, in turn, suggestive 
of the presence of a funnel in the EL, a structure that is 
typically expected to appear in true proteins [3]- How- 
ever, we find this scenario in all of the three sequences 
analyzed in this paper, including the homopolymer SO 
which cannot be certainly considered a reasonable model 
for a protein. In order to further clarify this point, we 
have computed the escape rates from the single valleys. 
Given any two directly connected minima Ci and C2 char- 
acterized by the potential energies Vi and V2 (Vi < V2), 
the escape rate from the minimum C = {Ci,C2} through 
the saddle Cg (with potential energy Vg) is given by 



n 



1=1 



717 



-1 (j) 



exp 



Wc 
'kuT 



(7) 



This formula, proposed by Langer , is obtained 

by considering the harmonic approximation of the po- 
tential in the vicinity of Ci, C2, and Cs- The s are 



the L' = 2L — 3 non zero frequencies of the minimum C 

(i - 



A^f' , where A^''' is the «-th negative eigenvalue 
of the Hessian of the potential energy V) . Analogously, 

uj\^ s are the L — 1 non zero frequencies of the saddle 
(those corresponding to the stable directions), while u)\\ 
is associated with the only expanding direction. Finally, 
7 is the dissipation rate 0|, while the Arrhenius ex- 
ponential factor depends on the height of the barrier, 
Wc = Vs — Vi^2, normalized to the reduced temperature 
KbT, Kb being the Boltzmann constant. 

The above expression has been shown to reproduce 
reasonably well the numerically obtained escape rates 
for heteropolymers in 2d at moderate temperatures for 
T <« Tg T(j|. Accordingly, in that regime, both the fold- 
ing and the unfolding dynamics towards and from the NC 
are driven by thermal activation processes, which deter- 
mine the transitions between DCP. Small F-values sug- 
gest that the heteropolymer may be trapped into some 
local valley of the EL, far from the NC. Eq. iQ shows ex- 
plicitly that in high-dimensional spaces, the escape rate 
does not simply depend on the energy barriers but also 
on entropic factors, which depend on geometrical features 
of basins of attractions of the stationary configurations 
in the EL. 

In Fig.lHlwe have plotted the rates Fg as a function of 6e 
for the three sequences aX T = Tf and 7 = 1 (since past 
simulations indicate that the effective value of 7 is the 
same at least in the whole NV, there is no need to know it 
when a comparative analysis is being carried out). There 
we notice a striking difference between SO and SI at large 
distances: actually the escape rates of SO are two orders 
of magnitude smaller than those of SI. Moreover, Fca ~ 
0(1) almost in the whole range of distances 5g for SI, 
indicating that no trapping is expected in the shallower 
minima, while it exhibits an almost exponential decrease 
for SO. An intermediate situation is observed for S2 at 
large distances. As a result, we see that a true funnel- 
like structure is markedly present only in the EL of the 
sequence SI, that was indeed already identified as a good 
folder by looking at different indicators 0, 0| . 



V. CONCLUSIONS AND PERSPECTIVES 

In this paper we have analyzed the structure of the 
energy landscape (EL) of a 2d off-lattice model of a 
polypeptidic chain and found that the relative closeness 
between neighbouring minima can be exploited to imple- 
ment an effective algorithm to identify directly connected 
minima. In order to put the analysis on firm quantita- 
tive grounds, we have tested several definitions of dis- 
tance between configurations, finding that the best per- 
formances are obtained for the angular distance 5^^' (see 

Eq. (PJ). In fact, the 5^g^ distances corresponding to DCP 
are more sharply concentrated at small values than for 
all other definitions of distances. In particular, we have 
found that restricting the systematic search to all pairs 
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FIG. 8: Escape rates l|7J versus the angular distance Se for 
each DCP for sequence SO (a) and SI (b) and S2 (c) at a 
temperature T = Tf (namely, T/ = 0.044 for SO and S2, 
Tf = 0.061 for SI). The filled circles refer to the transition 
rates Fci , while the empty circles to Fca • 



of minima whose distance is smaller than S, 



(1)* 



0.5, 



one can recover almost 99% of all DCP. The drawback of 
this approach is that a tiny fraction of DCP is unavoid- 
ably missed, but the consequences are not a priori clear. 
Since our analysis of the native valley has shown that 
minima are more rarefied in the vicinity of the native 
configuration (see Fig. [TJ , we conclude that it is wise to 
complement the above metric criterion with an extensive 
search limited to the minima of the native valley. It is 
now important to verify to what extent such a scenario 
extends to more realistic 3D systems, where the imple- 
mentation of effective algorithms to reconstruct the EL 
is even more crucial than in 2D. Furthermore, important 
hints about the true relevance of missing links in a con- 
nectivity graph will come after imposing a dynamics on 
the graph itself by adding the activation rates F relative 



to the transitions between directly connected minima. 
By comparing the resulting evolution with that of the 
original system one can in particular determine the min- 
imal fraction of DCP that is necessary to identify for a 
meaningful reconstruction of the folding process. 

Finally, in order to test how the observed rarefied den- 
sity of minima in the vicinity of the NC is connected to 
the presence of a true funnel-like structure, we have com- 
puted the activation rate F inside the NV. It turns out 
that moving away from the NC, while in the homopoly- 
mer F decreases very rapidly, it stays almost constant in 
the sequence SI, previously identified as a good fooldcr 
by other means. This is a clear indication that the ho- 
mopolymer can be trapped in several minima far from 
the minimal energy state, while an accessible native val- 
ley does exist for SI. 
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Appendix 

The database containing the minima of a model se- 
quence can be used to determine the first-order saddles 
directly connecting neighboring minima. The method we 
propose to accomplish this goal is based on two different 
algorithms to solve the following two problems: 

• Given two minima Ci and C2 and two configurations 
Vi and V2 belonging to their basins of attraction, 
one wants to determine two configurations Qi and 
Q2 on the segment 7^i'p2 arbitrarily close to the 
ridge dividing the two basins and lying on its op- 
posite sides; 

• given Qi and Q2, as defined above, one wants to 
apply an iterative procedure, leading the two points 
to converge on the saddle. 

The procedure is here described in more details: 

1. given Qi = Vi and Q2 = 7^2, an intermediate con- 
figuration C is defined, by setting its bond angles 
equal to the average of the corresponding angles of 
Qi and Q2; 

2. a steepest descent procedure is applied to C until a 
minimum Cm is reached; 
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3. (a) if Cm coincides with Ci (C2), we replace Qi 

(Q2) with C and repeat the step (2) until 
the euclidean distance dsiQi, Q2) between Qi 
and Q2 becomes smaller than a given thresh- 
old S; 

(b) if Cm differs from both Ci and C2, we assume 
that the basins of attraction of the two minima 
are not directly connected (i.e., Ci and C2 are 
not neighbors) and Cm is added to the minima 
database, provided it is not already included; 

(c) if dsiQi, Q2) < 5, we conclude they are close 
enough to the ridge (i.e. the stable manifold 
separating the basins of attraction of Ci and 
C2) and pass to item (4); 

4. we let Qx and Q2 evolve in time according to a 
steepest descent relaxation until d_B(Qi(i), 22(0) 
becomes larger than 5 and go back to (1), identify- 
ing Vi with Qi{t) and V2 with Q2(i); 

5. when, during the steepest descent, the gradient of 
the potential both in Qi and Q2 becomes smaller 
than another given threshold, we assume Qi and 
Q2 to be close enough to a first order saddle and 
refine the configuration by means of a Newton's 
algorithm. 

It must be stressed that this algorithm not only allows 
to find first order saddles but also enriches the number 
of known minima. The saddle-searching strategy here 
proposed can then be viewed as an "all purpose" method 
suitable for a complete exploration of all the features of 
the EL relevant for protein dynamics. 

The data reported in this paper were obtained by 
applying the algorithm to an initial database of min- 
ima determined by the method described in Ref. [loj . 
It amounts to performing a high-temperature (typically 
above Tf) Langevin dynamics, when the system is ex- 
pected to visit a large portion of the accessible phase 
space. The resulting trajectory is then sampled to pin- 
point a series of configurations, which are afterwards re- 
laxed according to a steepest descent dynamics and fi- 
nally refined by means of a standard Newton's method. 



This procedure was used for identifying a small database 
of 50-100 collapsed states for each of the three sequences 
described in section ITU 

All possible pairs in these initial minima databases 
were then searched for DCP by means of our saddle- 
finding algorithm. The new minima found during each 
run of the algorithm on all possible pairs were stored in 
the database to be investigated in successive runs. Ac- 
tually the number of pairs of minima grows much faster 
than the number of pairs investigated, thus making im- 
possible a complete analysis. The number of minima and 
saddles that were identified after three runs is reported 
in Table n 

In order to perform a complete search of all DCP at 
least in a restricted set of minima, we have selected all 
configurations of energy lower than Vj = Vq + LTf. In 
this restricted database all pairs of minima characterized 
by an angular distance smaller than 0.5 where investi- 
gated. The total number of saddles found by this proce- 
dure is reported in Table Q] 

TABLE I: Number of minima in the whole database and in 
the first and second shell. Number of investigated pairs and 
number of DCP found. Data have been reported for all the 
three examined sequences. 





SO 


SI 


S2 


Total number of 
minima 


2.3 X 10^ 


7.2 X lO"* 


6.4 X 10" 


Number of minima 
in the 1st shell 


64 


50 


49 


Number of minima 
in the 2nd shell 


1,181 


465 


348 


Number of 
minima below Tf 


66,470 


5,883 


8,670 


Number of 
investigated pairs 


9.1 X 10*^ 


0.94 X lO'' 


1.3 X 10^ 


Number of 
connected pairs 


84,990 


10,470 


14,356 
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